Jupyter Notebooks

Major features

  • Online environment for running snippets
  • Combine hypertext, code and charts on the same page
  • Perfect for sharing snippets to teach something to others
  • No need for a local environment with all the languages and libs

Language support

Multiple languages are supported through the concept of kernels: interpreters that execute tiny scripts one by one, on demand, while maintaining a runtime environment. A kernel is basically a REPL that's called from the web UI. The list currently includes:

  • Python
  • R
  • F# (on Azure Notebooks)
  • Julia/Scala/etc.
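
To make the kernel idea concrete, here is a minimal sketch of a REPL-like loop in Python. It is not how Jupyter is implemented (real kernels talk to the UI over a messaging protocol); the `run_cell` helper and the shared `namespace` dict are made up for illustration. It shows only the key property: each snippet runs on demand, and state from earlier snippets stays available to later ones.

```python
# Toy stand-in for a kernel: one long-lived namespace shared by all snippets.
namespace = {}

def run_cell(source, namespace):
    """Execute one snippet in the shared namespace (hypothetical helper)."""
    exec(source, namespace)

run_cell("x = 21", namespace)      # the first "cell" defines x
run_cell("y = x * 2", namespace)   # a later "cell" can still see x
print(namespace["y"])              # prints 42
```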

Data Science / Machine Learning

Python and R are also popular for data science and machine learning, so people made sure they integrate well with Jupyter Notebooks. This means that many objects render nicely on the Notebook UI:

  • Pandas DataFrames are rendered as tables
  • matplotlib charts are rendered as inline pictures
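
The rendering works through a display hook: when the last expression of a cell is an object that defines a `_repr_html_` method, the notebook shows the returned HTML instead of the plain-text repr. A minimal sketch, with a made-up `Temperature` class standing in for richer objects such as DataFrames:

```python
class Temperature:
    """Hypothetical example class; only _repr_html_ is the real hook name."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        # Plain-text fallback, used in a regular Python console
        return f"Temperature({self.celsius})"

    def _repr_html_(self):
        # Rich output picked up by the notebook UI
        return f"<b>{self.celsius:.1f} &deg;C</b>"

t = Temperature(21.5)
print(repr(t))
print(t._repr_html_())
```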

Scientific Python

For machine learning, 3 types of libraries always pop up:

  • Data Analysis: These are libraries that can load data from various sources, do various transformations, and compute basic statistics. Best Python example: Pandas
  • Machine Learning: These libraries implement machine learning algorithms. Best Python example: scikit-learn
  • Charting: Render various graphs and plots of our data. Best Python example: matplotlib

Example 1: Titanic dataset


In [21]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
# Make charts a bit prettier
plt.style.use('ggplot')

In [43]:
titanic = pd.read_csv('titanic.csv', sep = ',')

In [44]:
# What are the dimensions
titanic.shape


Out[44]:
(891, 12)

In [45]:
# What are the column names
titanic.columns


Out[45]:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [46]:
titanic['MySum'] = titanic["Survived"] + titanic["Pclass"]

In [47]:
# What do the first few rows look like
titanic.head()


Out[47]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked MySum
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 3
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 2
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 4
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 2
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S 3

In [48]:
# Let's clean up the data a bit
city_names =  {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"} 
titanic["EmbarkedCode"] = titanic["Embarked"]
titanic["Embarked"] = titanic["EmbarkedCode"].apply(lambda value: city_names.get(value))

In [49]:
# Check if it worked
titanic.head()


Out[49]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked MySum EmbarkedCode
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN Southampton 3 S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 Cherbourg 2 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN Southampton 4 S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 Southampton 2 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN Southampton 3 S
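
The same lookup can also be written with `Series.map`, which accepts a dictionary directly. A small sketch on made-up data (not the real Titanic file); note that codes missing from the dictionary come out as NaN here rather than None:

```python
import pandas as pd

city_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Toy stand-in for titanic["Embarked"], including one missing value
codes = pd.Series(["S", "C", "Q", None])

# map() looks every value up in the dict; unknown or missing keys become NaN
names = codes.map(city_names)
print(names)
```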

In [50]:
#Tell matplotlib to render graphs inside this notebook
%matplotlib inline

In [53]:
# Let's create a contingency table
pd.crosstab(titanic.Pclass, titanic.Survived, margins = True)


Out[53]:
Survived    0    1  All
Pclass
1          80  136  216
2          97   87  184
3         372  119  491
All       549  342  891

In [30]:
# Let's do the same, but as fractions of all passengers
pd.crosstab(titanic.Pclass, titanic.Survived, margins = True).apply(lambda col: col / len(titanic))


Out[30]:
Survived         0         1       All
Pclass
1         0.089787  0.152637  0.242424
2         0.108866  0.097643  0.206510
3         0.417508  0.133558  0.551066
All       0.616162  0.383838  1.000000
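
Dividing by `len(titanic)` gives each cell as a share of all passengers. To read survival rates per class instead, `pd.crosstab` has a `normalize` argument; `normalize="index"` makes every row sum to 1. A sketch on a tiny made-up sample, since the numbers for the real dataset are not reproduced here:

```python
import pandas as pd

# Ten made-up passengers (not the real data)
sample = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
})

# Each row is divided by its own total, giving rates per class
rates = pd.crosstab(sample["Pclass"], sample["Survived"], normalize="index")
print(rates)
```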

In [57]:
titanic.groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"]


Out[57]:
Survived    0    1
Sex
female     81  233
male      468  109
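
`count()` counts the non-null values in every column, which is why the cell above has to pick one fully populated column (`PassengerId`) afterwards. An alternative that counts rows directly is `size()`. A sketch on made-up data:

```python
import pandas as pd

# Five made-up passengers (not the real data)
sample = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male"],
    "Survived": [1, 0, 0, 0, 1],
})

# size() returns one row count per (Sex, Survived) pair;
# unstack() pivots Survived into columns, filling absent pairs with 0
counts = sample.groupby(["Sex", "Survived"]).size().unstack("Survived", fill_value=0)
print(counts)
```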

In [31]:
# Let's create a stacked bar chart for sex vs. survivability 
titanic.groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"].plot(kind="bar", stacked=True)


Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0xfc76d2f668>

In [64]:
titanic[titanic.Sex == 'female']


Out[64]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked MySum EmbarkedCode
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 Cherbourg 2 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN Southampton 4 S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 Southampton 2 S
8 9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 2 347742 11.1333 NaN Southampton 4 S
9 10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 0 237736 30.0708 NaN Cherbourg 3 C
10 11 1 3 Sandstrom, Miss. Marguerite Rut female 4.0 1 1 PP 9549 16.7000 G6 Southampton 4 S
11 12 1 1 Bonnell, Miss. Elizabeth female 58.0 0 0 113783 26.5500 C103 Southampton 2 S
14 15 0 3 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 0 350406 7.8542 NaN Southampton 3 S
15 16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 0 248706 16.0000 NaN Southampton 3 S
18 19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31.0 1 0 345763 18.0000 NaN Southampton 3 S
19 20 1 3 Masselmani, Mrs. Fatima female NaN 0 0 2649 7.2250 NaN Cherbourg 4 C
22 23 1 3 McGowan, Miss. Anna "Annie" female 15.0 0 0 330923 8.0292 NaN Queenstown 4 Q
24 25 0 3 Palsson, Miss. Torborg Danira female 8.0 3 1 349909 21.0750 NaN Southampton 3 S
25 26 1 3 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38.0 1 5 347077 31.3875 NaN Southampton 4 S
28 29 1 3 O'Dwyer, Miss. Ellen "Nellie" female NaN 0 0 330959 7.8792 NaN Queenstown 4 Q
31 32 1 1 Spencer, Mrs. William Augustus (Marie Eugenie) female NaN 1 0 PC 17569 146.5208 B78 Cherbourg 2 C
32 33 1 3 Glynn, Miss. Mary Agatha female NaN 0 0 335677 7.7500 NaN Queenstown 4 Q
38 39 0 3 Vander Planke, Miss. Augusta Maria female 18.0 2 0 345764 18.0000 NaN Southampton 3 S
39 40 1 3 Nicola-Yarred, Miss. Jamila female 14.0 1 0 2651 11.2417 NaN Cherbourg 4 C
40 41 0 3 Ahlin, Mrs. Johan (Johanna Persdotter Larsson) female 40.0 1 0 7546 9.4750 NaN Southampton 3 S
41 42 0 2 Turpin, Mrs. William John Robert (Dorothy Ann ... female 27.0 1 0 11668 21.0000 NaN Southampton 2 S
43 44 1 2 Laroche, Miss. Simonne Marie Anne Andree female 3.0 1 2 SC/Paris 2123 41.5792 NaN Cherbourg 3 C
44 45 1 3 Devaney, Miss. Margaret Delia female 19.0 0 0 330958 7.8792 NaN Queenstown 4 Q
47 48 1 3 O'Driscoll, Miss. Bridget female NaN 0 0 14311 7.7500 NaN Queenstown 4 Q
49 50 0 3 Arnold-Franchi, Mrs. Josef (Josefine Franchi) female 18.0 1 0 349237 17.8000 NaN Southampton 3 S
52 53 1 1 Harper, Mrs. Henry Sleeper (Myna Haxtun) female 49.0 1 0 PC 17572 76.7292 D33 Cherbourg 2 C
53 54 1 2 Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkin... female 29.0 1 0 2926 26.0000 NaN Southampton 3 S
56 57 1 2 Rugg, Miss. Emily female 21.0 0 0 C.A. 31026 10.5000 NaN Southampton 3 S
58 59 1 2 West, Miss. Constance Mirium female 5.0 1 2 C.A. 34651 27.7500 NaN Southampton 3 S
61 62 1 1 Icard, Miss. Amelie female 38.0 0 0 113572 80.0000 B28 None 2 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
807 808 0 3 Pettersson, Miss. Ellen Natalia female 18.0 0 0 347087 7.7750 NaN Southampton 3 S
809 810 1 1 Chambers, Mrs. Norman Campbell (Bertha Griggs) female 33.0 1 0 113806 53.1000 E8 Southampton 2 S
813 814 0 3 Andersson, Miss. Ebba Iris Alfrida female 6.0 4 2 347082 31.2750 NaN Southampton 3 S
816 817 0 3 Heininen, Miss. Wendla Maria female 23.0 0 0 STON/O2. 3101290 7.9250 NaN Southampton 3 S
820 821 1 1 Hays, Mrs. Charles Melville (Clara Jennings Gr... female 52.0 1 1 12749 93.5000 B69 Southampton 2 S
823 824 1 3 Moor, Mrs. (Beila) female 27.0 0 1 392096 12.4750 E121 Southampton 4 S
829 830 1 1 Stone, Mrs. George Nelson (Martha Evelyn) female 62.0 0 0 113572 80.0000 B28 None 2 NaN
830 831 1 3 Yasbeck, Mrs. Antoni (Selini Alexander) female 15.0 1 0 2659 14.4542 NaN Cherbourg 4 C
835 836 1 1 Compton, Miss. Sara Rebecca female 39.0 1 1 PC 17756 83.1583 E49 Cherbourg 2 C
842 843 1 1 Serepeca, Miss. Augusta female 30.0 0 0 113798 31.0000 NaN Cherbourg 2 C
849 850 1 1 Goldenberg, Mrs. Samuel L (Edwiga Grabowska) female NaN 1 0 17453 89.1042 C92 Cherbourg 2 C
852 853 0 3 Boulos, Miss. Nourelain female 9.0 1 1 2678 15.2458 NaN Cherbourg 3 C
853 854 1 1 Lines, Miss. Mary Conover female 16.0 0 1 PC 17592 39.4000 D28 Southampton 2 S
854 855 0 2 Carter, Mrs. Ernest Courtenay (Lilian Hughes) female 44.0 1 0 244252 26.0000 NaN Southampton 2 S
855 856 1 3 Aks, Mrs. Sam (Leah Rosen) female 18.0 0 1 392091 9.3500 NaN Southampton 4 S
856 857 1 1 Wick, Mrs. George Dennick (Mary Hitchcock) female 45.0 1 1 36928 164.8667 NaN Southampton 2 S
858 859 1 3 Baclini, Mrs. Solomon (Latifa Qurban) female 24.0 0 3 2666 19.2583 NaN Cherbourg 4 C
862 863 1 1 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48.0 0 0 17466 25.9292 D17 Southampton 2 S
863 864 0 3 Sage, Miss. Dorothy Edith "Dolly" female NaN 8 2 CA. 2343 69.5500 NaN Southampton 3 S
865 866 1 2 Bystrom, Mrs. (Karolina) female 42.0 0 0 236852 13.0000 NaN Southampton 3 S
866 867 1 2 Duran y More, Miss. Asuncion female 27.0 1 0 SC/PARIS 2149 13.8583 NaN Cherbourg 3 C
871 872 1 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 1 11751 52.5542 D35 Southampton 2 S
874 875 1 2 Abelson, Mrs. Samuel (Hannah Wizosky) female 28.0 1 0 P/PP 3381 24.0000 NaN Cherbourg 3 C
875 876 1 3 Najib, Miss. Adele Kiamie "Jane" female 15.0 0 0 2667 7.2250 NaN Cherbourg 4 C
879 880 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 1 11767 83.1583 C50 Cherbourg 2 C
880 881 1 2 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 1 230433 26.0000 NaN Southampton 3 S
882 883 0 3 Dahlberg, Miss. Gerda Ulrika female 22.0 0 0 7552 10.5167 NaN Southampton 3 S
885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39.0 0 5 382652 29.1250 NaN Queenstown 3 Q
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 Southampton 2 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN Southampton 3 S

314 rows × 14 columns


In [32]:
# Do the same graph, but only for people aged 18 or older
titanic[titanic.Age >= 18].groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"].plot(kind="bar", stacked=True)


Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0xfc77208588>
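
Both filters above use boolean masks: a comparison on a column yields a True/False Series, and indexing with it keeps the True rows. Two details are easy to miss: rows with a missing `Age` compare as False and silently drop out, and combining conditions needs `&` with parentheses around each condition. A sketch on made-up data:

```python
import pandas as pd

# Four made-up passengers, one with an unknown age
sample = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Age": [22.0, None, 15.0, 40.0],
})

# The NaN age compares as False, so that row is excluded
adults = sample[sample["Age"] >= 18]

# Parentheses are required because & binds tighter than == and >=
adult_women = sample[(sample["Sex"] == "female") & (sample["Age"] >= 18)]

print(len(adults), len(adult_women))
```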

Example 2: Video Game Sales


In [33]:
games = pd.read_csv("videogames.csv", sep = ",")

In [34]:
games.head()


Out[34]:
Name Platform Year_of_Release Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count User_Score User_Count Developer Rating
0 Wii Sports Wii 2006.0 Sports Nintendo 41.36 28.96 3.77 8.45 82.53 76.0 51.0 8 322.0 Nintendo E
1 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24 NaN NaN NaN NaN NaN NaN
2 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.68 12.76 3.79 3.29 35.52 82.0 73.0 8.3 709.0 Nintendo E
3 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.61 10.93 3.28 2.95 32.77 80.0 73.0 8 192.0 Nintendo E
4 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37 NaN NaN NaN NaN NaN NaN

In [83]:
by_publisher = games.groupby("Publisher").agg({"NA_Sales": sum, 
                                               "EU_Sales": sum, 
                                               "JP_Sales": sum, 
                                               "Global_Sales": sum, 
                                               "Critic_Score": np.mean}) 
by_publisher["Nintendo"]


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
D:\lib\anaconda\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)()

pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)()

pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)()

KeyError: 'Nintendo'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-83-e61d4fcb07a1> in <module>()
      4                                                "Global_Sales": sum,
      5                                                "Critic_Score": np.mean}) 
----> 6 by_publisher["Nintendo"]

D:\lib\anaconda\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

D:\lib\anaconda\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

D:\lib\anaconda\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

D:\lib\anaconda\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

D:\lib\anaconda\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)()

pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)()

pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)()

KeyError: 'Nintendo'
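
The KeyError above comes from how `[]` works on a DataFrame: `by_publisher["Nintendo"]` looks for a column named Nintendo, but after `groupby("Publisher")` the publishers are the row index. Rows are selected by label with `.loc`. A sketch with made-up sales figures (the real values are not shown here):

```python
import pandas as pd

# Made-up rows; column names follow the dataset above
games_sample = pd.DataFrame({
    "Publisher":    ["Nintendo", "Nintendo", "Sega"],
    "Global_Sales": [10.0, 5.0, 2.0],
    "Critic_Score": [80.0, 90.0, 70.0],
})

by_publisher_sample = games_sample.groupby("Publisher").agg(
    {"Global_Sales": "sum", "Critic_Score": "mean"}
)

# Publishers are index labels, not columns
print("Nintendo" in by_publisher_sample.columns)  # False
print("Nintendo" in by_publisher_sample.index)    # True

# .loc selects a row by its index label
nintendo = by_publisher_sample.loc["Nintendo"]
print(nintendo["Global_Sales"])
```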

In [69]:
by_publisher.columns


Out[69]:
Index(['NA_Sales', 'EU_Sales', 'JP_Sales', 'Global_Sales', 'Critic_Score'], dtype='object')

In [36]:
top_publishers = by_publisher.sort_values("Global_Sales", ascending = False)[0:15][["NA_Sales", "EU_Sales", "JP_Sales"]]
top_publishers


Out[36]:
                                        NA_Sales  EU_Sales  JP_Sales
Publisher
Nintendo                                  816.97    419.01    458.15
Electronic Arts                           599.50    373.91     14.35
Activision                                432.59    215.90      6.71
Sony Computer Entertainment               266.17    186.56     74.15
Ubisoft                                   252.74    161.99      7.52
Take-Two Interactive                      222.94    119.25      5.93
THQ                                       207.72     93.78      5.01
Konami Digital Entertainment               91.90     68.98     91.40
Sega                                      108.61     80.66     57.06
Namco Bandai Games                         69.90     42.16    127.85
Microsoft Game Studios                    157.43     68.64      3.30
Capcom                                     78.25     38.53     68.43
Atari                                     109.84     27.00     10.71
Warner Bros. Interactive Entertainment     81.51     51.06      1.10
Square Enix                                48.29     32.41     50.09
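
Sorting and then slicing the first rows can also be expressed with `DataFrame.nlargest`. A sketch on made-up figures with invented publisher names, keeping only the top two:

```python
import pandas as pd

# Made-up figures and publisher names
sales = pd.DataFrame(
    {"NA_Sales": [3.0, 9.0, 1.0], "Global_Sales": [7.0, 20.0, 4.0]},
    index=pd.Index(["B Corp", "A Corp", "C Corp"], name="Publisher"),
)

# Same rows as sort_values("Global_Sales", ascending=False)[0:2]
top = sales.nlargest(2, "Global_Sales")[["NA_Sales"]]
print(top)
```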

In [37]:
top_publishers.plot(kind="bar", figsize=(12,5))


Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0xfc77316c50>

In [38]:
# And again, as a stacked bar plot
top_publishers.plot(kind="bar", stacked = True, figsize=(12,5))


Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0xfc7857cc88>

Running Jupyter Notebooks locally with Docker

Install Docker, either natively or with docker machine

If you're running Linux, macOS or Windows 10, you can get native Docker at docker.com.

If you're running Windows 8, you need Docker Toolbox.

Open a docker console to verify that docker is running

Open Docker Quickstart Terminal and run the following:

$ docker ps
CONTAINER ID        IMAGE           COMMAND          CREATED          STATUS           PORTS            NAMES

You're probably seeing an empty list. This is OK: docker is running, you just don't have any containers running yet.

Choose one of the official Jupyter Docker images

  1. Go here: https://hub.docker.com/u/jupyter/
  2. Pick one of the images named *-notebook. For example, for python+scikit-learn+matplotlib, pick jupyter/scipy-notebook

Create a new container

$ docker run -p 8888:8888 -v /home/jovyan/work --name jupyternb jupyter/scipy-notebook start-notebook.sh --NotebookApp.token=''

A bit about what this does:

  • docker run is used to run a new container
  • -p 8888:8888 tells docker to map the port 8888 from the container to the host machine (or docker-machine vm)
  • -v /home/jovyan/work tells docker to create a volume for the directory where the notebooks are stored. Without this, the notebooks live only inside the container's own filesystem and are lost when the container is removed.
  • --name jupyternb specifies the name of the container. Without it, docker will generate a random name
  • jupyter/scipy-notebook is the name of the image from the docker hub to run
  • start-notebook.sh --NotebookApp.token='': totally optional, but this is specific to the Jupyter Notebook Docker image and tells it to disable authentication. Otherwise, you would have to get the initial configuration token from the docker logs.
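
For readers who prefer to launch the container from a script, the same command can be assembled in Python. This is only a sketch: the `docker_run_command` helper and the `jupyter-work` volume name are invented for this example, and nothing is executed unless the list is passed to `subprocess.run`. Passing `name:path` to `-v` creates a named volume rather than an anonymous one, which is easier to find again later.

```python
import shlex

def docker_run_command(volume="/home/jovyan/work"):
    """Build the argument list for the docker run call shown above."""
    return [
        "docker", "run",
        "-p", "8888:8888",          # container port 8888 -> host port 8888
        "-v", volume,               # where the notebooks are persisted
        "--name", "jupyternb",
        "jupyter/scipy-notebook",
        "start-notebook.sh", "--NotebookApp.token=",  # empty token: no auth
    ]

# Same command as in the text (anonymous volume)
print(shlex.join(docker_run_command()))

# Variant with a named volume; "jupyter-work" is a hypothetical name
print(shlex.join(docker_run_command("jupyter-work:/home/jovyan/work")))

# To actually start the container (requires Docker):
# import subprocess; subprocess.run(docker_run_command(), check=True)
```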

Accessing the container

If using docker native, the app will be available at http://localhost:8888.

If using docker-machine, you'll need to find out its IP first using docker-machine inspect default | grep IPAddress, usually 192.168.99.100. The app will then be available at e.g. http://192.168.99.100:8888.
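
A quick way to confirm that the server answers at the expected address is a small check with the standard library. The `notebook_url` and `is_reachable` helpers are made up for this sketch, and the IP address is just the usual docker-machine default mentioned above:

```python
import urllib.error
import urllib.request

def notebook_url(host="localhost", port=8888):
    """Build the address the notebook app should be served at."""
    return f"http://{host}:{port}"

def is_reachable(url, timeout=2.0):
    """Return True if anything answers an HTTP request at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, just not with a 2xx status
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...

print(notebook_url())                  # docker native
print(notebook_url("192.168.99.100"))  # docker-machine

# With the container running, check the address that applies to you:
# print(is_reachable(notebook_url()))
```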

Starting the container

If the container is stopped (e.g. after a reboot), it can be started with:

$ docker start jupyternb

Good Luck!

You should now be able to upload or create notebooks, as well as datasets that can be loaded from the notebooks.

Note: All work will be persisted to the docker volume, but you are encouraged to keep a separate copy of your files anyway. They can be downloaded by choosing File > Download as > Notebook from the menu.
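
A downloaded notebook is a JSON file (extension .ipynb), so it can be backed up or inspected with ordinary tools. A sketch that pulls the code cells out of a notebook; the tiny in-memory notebook below is made up, and real files carry more metadata:

```python
import json

# Minimal made-up notebook in the nbformat 4 layout
raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["import pandas as pd\n", "pd.__version__"]},
    ],
})

def code_cells(notebook_json):
    """Return the source text of every code cell."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # source may be a single string or a list of lines
            sources.append(source if isinstance(source, str) else "".join(source))
    return sources

print(code_cells(raw))
```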

